ON OPTIMALITY PROPERTIES OF THE
SHIRYAEV-ROBERTS PROCEDURE

By Moshe Pollak and Alexander G. Tartakovsky

The Hebrew University of Jerusalem and the University of Southern
California

We consider the simple changepoint problem setting, where
observations are independent, iid pre-change and iid post-change, with
known pre- and post-change distributions. The Shiryaev-Roberts
detection procedure is known to be asymptotically minimax in the sense
of minimizing maximal expected detection delay subject to a bound
on the average run length to false alarm, as the latter goes to infinity.
Here we present other optimality properties of the Shiryaev-Roberts
procedure.

1. Introduction. Changepoint problems deal with detecting a change
in the state of a process, where information one has about the state of
affairs is in the form of observations. In the sequential setting, observations
are obtained sequentially and, as long as their behavior is consistent with
the initial (or target) state, one is content to let the process continue. If the
state changes, then one is interested in detecting that a change is in effect,
usually as soon as possible after its occurrence.

Any detection policy may give rise to false alarms. Intuitively, the desire
to detect a change quickly causes one to be (relatively) trigger-happy, which
will bring about many false alarms if there is no change. On the other hand,
attempting to avoid false alarms too strenuously will lead to a long delay
between the time of occurrence of a real change and its detection. Common
operating characteristics of a sequential detection policy are ARL2FA =
the Average Run Length (the expected number of observations) to False
Alarm (assuming that there is no change) and the AD2D = Average Delay
to Detection (the expected delay between a real change and its detection).
The gist of the changepoint problem is to produce a detection policy that
(at least approximately) minimizes the AD2D subject to a bound on the
ARL2FA. The constitution of a good policy depends very much on what
is known about the stochastic behavior of the observations, both pre- and
post-change.

Let $X_1, X_2, \ldots$ denote the series of observations, and let $\nu$ be the serial
number of the first post-change observation. Let $P_k$ and $E_k$ denote probability
and expectation when $\nu = k$, and let $P_\infty$ and $E_\infty$ denote the same when
$\nu = \infty$ (i.e., there never is a change). A sequential change detection
procedure is identified with a stopping time $N$ on $X_1, X_2, \ldots$, i.e.,
$\{N \le n\} \in \mathcal{F}_n$, where $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$ is the
sigma-algebra generated by the first $n$ observations.

In this paper, we consider the simplest setting of the problem, where the
observations are independent, each having density $f_0$ pre-change and density
$f_1$ post-change, where both $f_0$ and $f_1$ are known, and only the value of $\nu$, the
point of change, is unknown. (In practice, often $f_0$ is known. Realistically,
$f_1$ is not known, but the simple setting yields a benchmark for the best
one can hope.) In this setting, Moustakides (1986) proved that the Cusum
procedure (Page, 1954) is optimal in the sense of minimizing the worst-worst
case (essential supremum) expected detection delay

$$\sup_{k \ge 1} \operatorname{ess\,sup} E_k[(N - k)^+ \mid \mathcal{F}_k]$$

over all stopping times for which

$$\mathrm{ARL2FA}(N) = E_\infty N \ge B, \tag{1}$$

where $B > 0$ is a value set before the surveillance begins. See also Lorden
(1971) and Ritov (1990). For a continuous-time Brownian motion a similar
result has been established by Beibel (1996) and Shiryaev (1996).

Pollak (1985) proved that the Shiryaev-Roberts procedure (Roberts, 1966;
Shiryaev) is asymptotically (as $B \to \infty$) optimal in the sense of
minimizing the supremum AD2D

$$\sup_{k \ge 1} E_k(N - k \mid N \ge k)$$

over all stopping times $N$ that satisfy (1).

Here we prove other (exact) optimality properties of the Shiryaev-Roberts
detection procedure. To be specific, in Section 2 we prove that the
Shiryaev-Roberts procedure is (exactly) optimal in the sense of minimizing the
"integral AD2D" $= \sum_{k=1}^{\infty} E_k(N - k)^+$ for every $B > 0$ in the class of procedures
with the ARL2FA constraint (1). In Section 3 we consider the setting where
a change occurs in a distant future (i.e., $\nu$ is large) and the detection of a
change is preceded by a large number of false detections. We prove that the
Shiryaev-Roberts procedure is the best one can do in terms of minimizing
the expected detection delay asymptotically when $\nu \to \infty$ in the class (1),
for every $B > 0$.

Both problem settings have been previously considered for a continuous-time
Brownian motion model. See Feinberg and Shiryaev (2006); Shiryaev
(1963) and Remarks 1 and 3 below.



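2. Minimizing integral AD2D. Using the notation of the previous
section, the Shiryaev-Roberts procedure calls for stopping and raising an
alarm at

$$N_{A_B} = \min\{n \ge 1 : R_n \ge A_B\}, \tag{2}$$

where

$$R_n = \sum_{k=1}^{n} \frac{p(X_1, \ldots, X_n \mid \nu = k)}{p(X_1, \ldots, X_n \mid \nu = \infty)} = \sum_{k=1}^{n} \prod_{i=k}^{n} \frac{f_1(X_i)}{f_0(X_i)} \tag{3}$$

and $A_B$ is such that $E_\infty N_{A_B} = B$.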
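
The stopping rule (2) with the statistic (3) can be sketched in Python as follows. This is a minimal illustration, not code from the paper: the function names are hypothetical, and the likelihood ratios $f_1(X_i)/f_0(X_i)$ are assumed to be already evaluated and supplied as a list. The sketch relies on the recursion $R_n = (1 + R_{n-1}) f_1(X_n)/f_0(X_n)$ with $R_0 = 0$, which follows directly from (3), and compares it with the defining double sum.

```python
from math import prod

def sr_statistic(lrs):
    """Shiryaev-Roberts statistics R_1..R_n via R_n = (1 + R_{n-1}) * lr_n, R_0 = 0."""
    stats, r = [], 0.0
    for lr in lrs:
        r = (1.0 + r) * lr
        stats.append(r)
    return stats

def sr_statistic_direct(lrs):
    """Same statistics computed from the defining sum of products in (3)."""
    n = len(lrs)
    return [sum(prod(lrs[k:m + 1]) for k in range(m + 1)) for m in range(n)]

def sr_stopping_time(lrs, threshold):
    """First n >= 1 with R_n >= threshold, or None if no alarm on this sample."""
    for n, r in enumerate(sr_statistic(lrs), start=1):
        if r >= threshold:
            return n
    return None

# Hypothetical likelihood-ratio values, chosen only for illustration.
lrs = [0.5, 2.0, 1.5, 0.8, 3.0]
print(sr_statistic(lrs))
print(sr_stopping_time(lrs, threshold=5.0))
```

The recursive form needs only the previous value of the statistic, which is why it is the natural choice for sequential monitoring; the direct form is included only as a cross-check.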

Below in Theorem 1 we prove that the Shiryaev-Roberts procedure is
exactly optimal in the sense of minimizing the integral AD2D $= \sum_{k=1}^{\infty} E_k(N - k)^+$
in the class of detection procedures $\Delta_B = \{N : \mathrm{ARL2FA}(N) \ge B\}$ in
which the mean time to false alarm is not less than the given positive
number $B$. We begin with a sketch of the argument why one may expect this to
be true.

To this end, we first need to consider the following Bayesian problem,
denoted by $\mathcal{B}(p, c)$. Suppose $\nu$ is random and has a geometric prior
distribution

$$P(\nu = k) = p(1 - p)^{k-1}, \quad k \ge 1,$$

and the losses associated with stopping at time $N$ are 1 if $N < \nu$ and
$c \cdot (N - \nu)$ if $N \ge \nu$, where $0 < p < 1$ and $c > 0$ are fixed constants. Write
$P^p(\cdot) = \sum_{k=1}^{\infty} p(1 - p)^{k-1} P_k(\cdot)$ for the "average" probability and $E^p$ for
the corresponding expectation.

Solution of $\mathcal{B}(p, c)$ requires minimization of the expected loss

$$\varphi_{c,p}(N) = P^p(N < \nu) + c\, E^p(N - \nu)^+, \tag{4}$$

and the Bayes rule for this problem is given by the Shiryaev procedure (cf.
Shiryaev, 1963, 1978), which is the stopping time

$$T_{p,c} = \min\{n \ge 1 : P^p(\nu \le n \mid \mathcal{F}_n) \ge \delta_{p,c}\}, \tag{5}$$

where $0 < \delta_{p,c} < 1$ is an appropriate threshold.

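Obviously, the $\mathcal{B}(p, c)$ problem is equivalent to maximizing

$$\frac{1}{p}[1 - \varphi_{c,p}(N)] = \frac{P^p(N \ge \nu)}{p} - c\, \frac{E^p(N - \nu)^+}{p}.$$

In the proof of Theorem 1 below, we show that, for any stopping time $N$,

$$\frac{P^p(N \ge \nu)}{p} \to E_\infty N, \qquad \frac{E^p(N - \nu)^+}{p} \to \sum_{k=1}^{\infty} E_k(N - k)^+ \quad \text{as } p \to 0.$$

Hence

$$\frac{1}{p}[1 - \varphi_{c,p}(N)] \to E_\infty N - c \sum_{k=1}^{\infty} E_k(N - k)^+ \quad \text{as } p \to 0,$$

which should be maximized in the class $\Delta_B$.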

We also show that the Shiryaev procedure $T_{p,c}$ converges to the
Shiryaev-Roberts procedure as $p \to 0$. Therefore, it stands to reason that the
integral AD2D $= \sum_{k=1}^{\infty} E_k(N - k)^+$ is minimized by the Shiryaev-Roberts
procedure subject to $E_\infty N \ge B$.

Formal details are given in the following theorem and its proof. 
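
The two limits used in this sketch can be checked by hand for the simplest stopping rule, the deterministic time $N \equiv m$ (an illustrative choice, not one considered in the paper). In that case $P_k(N \ge k) = 1$ for $k \le m$, so $E_\infty N = m$ and $\sum_k E_k(N - k)^+ = m(m-1)/2$. The following Python sketch, with a hypothetical function name, evaluates the two normalized terms under the Geometric($p$) prior for decreasing $p$.

```python
def normalized_terms(m, p):
    """For the deterministic rule N = m, return the pair
    (P^p(N >= nu) / p, E^p(N - nu)^+ / p) under the Geometric(p) prior."""
    prob = sum((1 - p) ** (k - 1) for k in range(1, m + 1))
    delay = sum((m - k) * (1 - p) ** (k - 1) for k in range(1, m + 1))
    return prob, delay

m = 10
for p in (0.1, 0.01, 0.001):
    print(p, normalized_terms(m, p))
# As p -> 0 the first term approaches E_inf N = m = 10 and the second
# approaches sum_k (m - k)^+ = m * (m - 1) / 2 = 45.
```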

Theorem 1. Let $A_B$ be chosen so that $\mathrm{ARL2FA}(N_{A_B}) = B$. Then the
Shiryaev-Roberts procedure defined by (2) and (3) minimizes

$$\sum_{k=1}^{\infty} E_k(N - k)^+ \tag{6}$$

over all stopping times $N$ that satisfy $E_\infty N \ge B$, i.e.,

$$\inf_{N \in \Delta_B} \sum_{k=1}^{\infty} E_k(N - k)^+ = \sum_{k=1}^{\infty} E_k(N_{A_B} - k)^+ \quad \text{for every } B > 0,$$

where $\Delta_B = \{N : \mathrm{ARL2FA}(N) \ge B\}$.

Proof. Consider the Bayesian problem $\mathcal{B}(p, c)$ with Geometric($p$) prior
distribution and the average loss (4). Shiryaev (1963, 1978) proved that the
expected loss (4) for the problem $\mathcal{B}(p, c)$ is minimized by the stopping time
(5). Applying Bayes' formula, it is easy to see that

$$P^p(\nu \le n \mid \mathcal{F}_n) = \frac{R_{p,n}}{R_{p,n} + 1/p},$$

where

$$R_{p,n} = \sum_{k=1}^{n} \prod_{i=k}^{n} \frac{f_1(X_i)}{(1 - p) f_0(X_i)}.$$

Hence, the Shiryaev rule can be written in the equivalent form

$$T_{p,c} = \min\{n \ge 1 : R_{p,n} \ge A_{p,c}\}, \tag{7}$$

where $A_{p,c} = (1/p)[\delta_{p,c}/(1 - \delta_{p,c})]$.

Note first that $R_{p,n} \to R_n$ as $p \to 0$.
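
The Bayes-formula identity $P^p(\nu \le n \mid \mathcal{F}_n) = R_{p,n}/(R_{p,n} + 1/p)$ can be checked numerically. The Python sketch below uses hypothetical function names and assumes the likelihood ratios $\Lambda_i = f_1(X_i)/f_0(X_i)$ (shorthand introduced here) are given as plain numbers. It computes the posterior directly from the geometric prior and compares it with the form based on the recursion $R_{p,n} = (1 + R_{p,n-1}) \Lambda_n / (1 - p)$, $R_{p,0} = 0$.

```python
from math import prod

def posterior_direct(lrs, p):
    """P(nu <= n | X_1..X_n) from Bayes' formula with prior p * (1 - p)^(k - 1)."""
    n = len(lrs)
    # Weight of "change at k <= n": prior mass times likelihood ratio of X_k..X_n.
    changed = sum(p * (1 - p) ** (k - 1) * prod(lrs[k - 1:]) for k in range(1, n + 1))
    # Weight of "no change yet": prior mass of nu > n, likelihood ratio equal to 1.
    unchanged = (1 - p) ** n
    return changed / (changed + unchanged)

def posterior_via_statistic(lrs, p):
    """The same posterior written as R_{p,n} / (R_{p,n} + 1/p)."""
    r = 0.0
    for lr in lrs:
        r = (1.0 + r) * lr / (1.0 - p)
    return r / (r + 1.0 / p)

# Hypothetical likelihood-ratio values, chosen only for illustration.
lrs = [0.5, 2.0, 1.5, 0.8, 3.0]
for p in (0.3, 0.05, 0.001):
    print(p, posterior_direct(lrs, p), posterior_via_statistic(lrs, p))
```

Setting $p$ close to 0 in the recursion makes the factor $1/(1-p)$ close to 1, which is the numerical counterpart of $R_{p,n} \to R_n$.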

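By Theorem 1 of Pollak (1985), there exist a constant $0 < c^* < \infty$ and a
sequence $\{p_i, c_i\}_{i=1}^{\infty}$ with $p_i \to 0$, $c_i \to c^*$ as $i \to \infty$, such that $N_{A_B}$ is the limit
of the Bayes rules $T_{p_i, c_i}$ as $i \to \infty$. Furthermore,

$$\limsup_{p \to 0,\ c \to c^*} \frac{1 - \varphi_{c,p}(T_{p,c})}{1 - \varphi_{c,p}(N_{A_B})} = 1, \tag{8}$$

where $\varphi_{c,p}(N)$ is the expected loss associated with using the stopping time
$N$ for $\mathcal{B}(p, c)$.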

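Now, for any stopping time $N$,

$$\frac{1}{p}[1 - \varphi_{c,p}(N)] = \frac{1}{p}\left[(1 - P^p(N < \nu)) - c\, E^p(N - \nu)^+\right] = \frac{P^p(N \ge \nu)}{p}\left[1 - c\, E^p(N - \nu \mid N \ge \nu)\right].$$

Since

$$\frac{P^p(N \ge \nu)}{p} = \frac{1}{p} \sum_{k=1}^{\infty} P_k(N \ge k)\, p (1 - p)^{k-1} = \sum_{k=1}^{\infty} P_\infty(N \ge k) (1 - p)^{k-1} \xrightarrow[p \to 0]{} \sum_{k=1}^{\infty} P_\infty(N \ge k) = E_\infty N$$

and

$$\begin{aligned}
\frac{P^p(N \ge \nu)\, E^p(N - \nu \mid N \ge \nu)}{p} &= \frac{E^p(N - \nu; N \ge \nu)}{p} = \frac{1}{p} \sum_{k=1}^{\infty} E_k(N - k; N \ge k)\, p (1 - p)^{k-1} \\
&= \sum_{k=1}^{\infty} E_k(N - k; N \ge k) (1 - p)^{k-1} \\
&\xrightarrow[p \to 0]{} \sum_{k=1}^{\infty} E_k(N - k; N \ge k) = \sum_{k=1}^{\infty} E_k(N - k)^+,
\end{aligned}$$

it follows that for any stopping time $N$ that has finite ARL2FA

$$\frac{1}{p}[1 - \varphi_{c,p}(N)] \xrightarrow[p \to 0]{} E_\infty N - c \sum_{k=1}^{\infty} E_k(N - k)^+,$$

which together with (8) establishes that the Shiryaev-Roberts procedure
minimizes (6) over all stopping times that satisfy $E_\infty N = B$. Note that
if $B_1 > B$, then $N_{A_{B_1}}$ is stochastically larger than $N_{A_B}$, i.e., all
expectations in (6) become larger. This implies that the Shiryaev-Roberts procedure
minimizes (6) in the class $\Delta_B$. This completes the proof of the theorem. □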

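Corollary 1. The Shiryaev-Roberts procedure defined by (2) and (3)
minimizes

$$\frac{\sum_{k=1}^{\infty} E_k(N - k \mid N \ge k)\, P_\infty(N \ge k)}{\sum_{j=1}^{\infty} P_\infty(N \ge j)} \tag{9}$$

over all stopping times $N$ that satisfy $E_\infty N = B$, i.e.,

$$\inf_{\{N : E_\infty N = B\}} \sum_{k=1}^{\infty} w_k(N)\, E_k(N - k \mid N \ge k) = \sum_{k=1}^{\infty} w_k(N_{A_B})\, E_k(N_{A_B} - k \mid N_{A_B} \ge k),$$

where

$$w_k(N) = \frac{P_\infty(N \ge k)}{\sum_{j=1}^{\infty} P_\infty(N \ge j)}$$

and the threshold $A_B$ is selected so that $E_\infty N_{A_B} = B$.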

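Proof. Obviously, $\sum_{j=1}^{\infty} P_\infty(N \ge j) = E_\infty N = B$, so the denominator
in (9) is constant over all stopping times under consideration. As for the
numerator, the event $\{N \ge k\}$ depends only on the first $k - 1$ observations,
which have the same distribution under $P_k$ and $P_\infty$, so that

$$E_k(N - k \mid N \ge k)\, P_\infty(N \ge k) = E_k(N - k \mid N \ge k)\, P_k(N \ge k) = E_k(N - k; N \ge k) = E_k(N - k)^+. \tag{10}$$

Application of Theorem 1 concludes the proof. □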

Remark 1. Recently, Feinberg and Shiryaev (2006) established a result
similar to Theorem 1 for Brownian motion where an abrupt change occurs
in the drift, in which case the integral AD2D is $\int_0^{\infty} E_\nu(N - \nu)^+ \, d\nu$. They
refer to this as "A Generalized Bayesian Setting."

While Theorem 1 and Corollary 1 are of interest in their own right, they
are also useful for proving other optimality results, as will become
apparent in the next section.
3. Optimality for a change appearing after many re-runs.
Consider a context in which it is of utmost importance to detect a real change
as quickly as possible after its occurrence, even at the price of raising many
false alarms (using a repeated application of the same stopping rule) before
the change occurs. This essentially means that the changepoint $\nu$ is very
large compared to the constant $B$ which, in this case, defines the mean time
between consecutive false alarms.

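To be more specific, let $N_{A_B}^{(1)}, N_{A_B}^{(2)}, \ldots$ be sequential independent
repetitions of $N_{A_B}$ defined in (2), i.e.,

$$N_{A_B}^{(i)} = \min\Big\{n > \sum_{j=0}^{i-1} N_{A_B}^{(j)} : R_n^{(i)} \ge A_B\Big\} - \sum_{j=0}^{i-1} N_{A_B}^{(j)}, \tag{11}$$

where $N_{A_B}^{(0)} = 0$ and

$$R_n^{(i)} = \sum_{k = 1 + \sum_{j=0}^{i-1} N_{A_B}^{(j)}}^{n} \ \prod_{\ell=k}^{n} \frac{f_1(X_\ell)}{f_0(X_\ell)}, \qquad \sum_{j=0}^{i-1} N_{A_B}^{(j)} < n \le \sum_{j=0}^{i} N_{A_B}^{(j)}. \tag{12}$$

Therefore, $R_n^{(i)}$, $n \ge \sum_{j=1}^{i-1} N_{A_B}^{(j)} + 1$, is nothing but the Shiryaev-Roberts statistic
that is renewed from scratch after the $(i-1)$st false alarm (under $P_\infty$) and
is applied to the segment of data

$$X_{\sum_{j=1}^{i-1} N_{A_B}^{(j)} + 1},\ X_{\sum_{j=1}^{i-1} N_{A_B}^{(j)} + 2},\ \ldots.$$

Note that $E_\infty N_{A_B}^{(i)} = B$ for $i \ge 1$.
Let, for $j \ge 1$,

$$Q_j = \sum_{i=1}^{j} N_{A_B}^{(i)} \tag{13}$$

be the time of the $j$-th alarm, and let $J_\nu = \min\{j \ge 1 : Q_j \ge \nu\}$, i.e., $Q_{J_\nu}$
is the time of detection of a true change that occurs at $\nu$ after $J_\nu - 1$ false
alarms have been raised.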
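
The repeated procedure can be sketched in Python as follows. This is a minimal illustration with hypothetical function names, assuming the likelihood ratios $f_1(X_n)/f_0(X_n)$ are supplied as a list: the Shiryaev-Roberts statistic is updated recursively, an alarm is recorded whenever the threshold is reached, and the statistic is then renewed from scratch. The detection time is the first alarm at or after the changepoint $\nu$.

```python
def repeated_sr_alarms(lrs, threshold):
    """Alarm times Q_1 < Q_2 < ... when the SR statistic restarts from 0 after each alarm."""
    alarms, r = [], 0.0
    for n, lr in enumerate(lrs, start=1):
        r = (1.0 + r) * lr
        if r >= threshold:
            alarms.append(n)
            r = 0.0  # renew the statistic from scratch
    return alarms

def detection_time(alarms, nu):
    """First alarm time that is >= nu, or None if no such alarm was raised."""
    return next((q for q in alarms if q >= nu), None)

# With all likelihood ratios equal to 1 the statistic equals the number of
# observations in the current cycle, so a threshold of 3 gives an alarm
# every third observation.
alarms = repeated_sr_alarms([1.0] * 10, threshold=3.0)
print(alarms)                     # [3, 6, 9]
print(detection_time(alarms, 5))  # 6
```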

Our next theorem states that the Shiryaev-Roberts procedure defined by
$Q_{J_\nu}$ is asymptotically (as $\nu \to \infty$) optimal with respect to the expected
delay $E_\nu(Q_{J_\nu} - \nu)$ in the class of detection procedures for which the mean
time between false alarms is not less than $B$. Note that this result is not
asymptotic with respect to the ARL2FA. In fact, it holds for every positive
$B$.
Theorem 2. Let $\nu$ be the time of the change. Let $N_{A_B}^{(1)}, N_{A_B}^{(2)}, \ldots$ be
sequential independent repetitions of $N_{A_B}$ as defined in (11) and let $Q_1, Q_2, \ldots$
be as in (13). Let $J_\nu = \min\{j : Q_j \ge \nu\}$.

(i) $\lim_{\nu \to \infty} E_\nu(Q_{J_\nu} - \nu)$ exists.

(ii) Suppose a detection procedure $N$ with $\mathrm{ARL2FA}(N) = B$ is applied
repeatedly. Let $N_1, N_2, \ldots$ be sequential repetitions of $N$, let $W_j = \sum_{i=1}^{j} N_i$
and let $K_\nu = \min\{j : W_j \ge \nu\}$. Then, for every $B > 0$,

$$\lim_{\nu \to \infty} E_\nu(Q_{J_\nu} - \nu) \le \lim_{\nu \to \infty} E_\nu(W_{K_\nu} - \nu). \tag{14}$$

(iii) Inequality (14) holds for all $N \in \Delta_B$, where $\Delta_B = \{N : E_\infty N \ge B\}$.

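Proof. Proof of (i). By renewal theory, the distribution of $\nu - Q_{J_\nu - 1}$ has
a limit

$$\lim_{\nu \to \infty} P_\infty(\nu - Q_{J_\nu - 1} = k) = \frac{P_\infty(N_{A_B} \ge k)}{\sum_{j=1}^{\infty} P_\infty(N_{A_B} \ge j)} \tag{15}$$

(see, e.g., Feller, 1966, page 356).

Using (15) and letting $N_{A_B}$ be independent of $N_{A_B}^{(1)}, N_{A_B}^{(2)}, \ldots$, we obtain

$$\begin{aligned}
E_\nu(Q_{J_\nu} - \nu) &= E_\nu E_\nu(Q_{J_\nu} - \nu \mid Q_{J_\nu - 1}) \\
&= \sum_{k=1}^{\nu} E_k(N_{A_B} - k \mid \nu - Q_{J_\nu - 1} = k,\ N_{A_B} \ge k)\, P_\infty(\nu - Q_{J_\nu - 1} = k) \\
&= \sum_{k=1}^{\nu} E_k(N_{A_B} - k \mid N_{A_B} \ge k)\, P_\infty(\nu - Q_{J_\nu - 1} = k) \\
&\xrightarrow[\nu \to \infty]{} \frac{\sum_{k=1}^{\infty} E_k(N_{A_B} - k \mid N_{A_B} \ge k)\, P_\infty(N_{A_B} \ge k)}{\sum_{j=1}^{\infty} P_\infty(N_{A_B} \ge j)} \\
&= \frac{\sum_{k=1}^{\infty} E_k(N_{A_B} - k)^+}{E_\infty N_{A_B}} = \frac{\sum_{k=1}^{\infty} E_k(N_{A_B} - k)^+}{B},
\end{aligned}$$

which completes the proof of (i).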
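
The renewal limit used above can be illustrated numerically. The Python sketch below is only an illustration under stated assumptions: the cycle length $N$ is given a small hypothetical distribution on $\{1, 2, 3\}$ (not the actual law of a Shiryaev-Roberts stopping time), and all function names are invented for this example. It computes the exact distribution of the time elapsed since the last alarm before $\nu$ by the standard renewal recursion and compares it with $P(N \ge k)/EN$.

```python
def tail_prob(pmf, k):
    """P(N >= k) for a cycle-length distribution given as {value: probability}."""
    return sum(q for m, q in pmf.items() if m >= k)

def renewal_mass(pmf, horizon):
    """u[n] = P(some alarm time Q_j equals n), with u[0] = 1 (start of surveillance)."""
    u = [1.0] + [0.0] * horizon
    for n in range(1, horizon + 1):
        u[n] = sum(q * u[n - m] for m, q in pmf.items() if m <= n)
    return u

def age_distribution(pmf, nu):
    """P(nu - Q_{J_nu - 1} = k): a renewal at nu - k followed by a cycle of length >= k."""
    u = renewal_mass(pmf, nu)
    return {k: u[nu - k] * tail_prob(pmf, k) for k in range(1, max(pmf) + 1) if k <= nu}

def limiting_age(pmf):
    """Limiting distribution P(N >= k) / E N."""
    mean = sum(m * q for m, q in pmf.items())
    return {k: tail_prob(pmf, k) / mean for k in range(1, max(pmf) + 1)}

# Hypothetical cycle-length distribution, chosen only for illustration.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}
print(age_distribution(pmf, 200))
print(limiting_age(pmf))
```

For this aperiodic example the exact distribution at $\nu = 200$ is numerically indistinguishable from the limit, since the convergence is geometric.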

Proof of (ii). The same argument as in the proof of (i) yields

$$\lim_{\nu \to \infty} E_\nu(W_{K_\nu} - \nu) = \frac{\sum_{k=1}^{\infty} E_k(N - k)^+}{B}.$$

Combining this with Corollary 1 concludes the proof.

Proof of (iii). Write $\mathrm{AD2D}(B) = \lim_{\nu \to \infty} E_\nu(Q_{J_\nu} - \nu)$ for the AD2D of the
Shiryaev-Roberts procedure $N_{A_B}$. Note that AD2D($B$) tends to 0 as $B \to 0$
and to $\infty$ as $B \to \infty$. By virtue of (ii), it suffices to show that AD2D($B$) is
nondecreasing in $B$.

Note that AD2D($B$) is continuous in $B$. Therefore, if AD2D($B$) were
not nondecreasing in $B$, there would exist $0 < B_1 < B_2 < \infty$ such that
$\mathrm{AD2D}(B_1) = \mathrm{AD2D}(B_2)$ and $\mathrm{AD2D}(B) > \mathrm{AD2D}(B_1) = \mathrm{AD2D}(B_2)$ for all
$B_1 < B < B_2$.

Consider the following renewal-theoretic argument. Let $L_1, L_2, \ldots$ and
$M_1, M_2, \ldots$ be independent sequences of positive random variables having
finite means, each of them iid. Let $G^L$ be the asymptotic distribution of
the residual waiting time of the sequence $\{L_i\}$ (i.e., of the overshoot of the
sequence $\{\sum_{i=1}^{n} L_i\}_{n \ge 1}$ over $t$, as $t \to \infty$) and let $G^M$ be that of the sequence
$\{M_i\}$. Let $G^T$ be the asymptotic distribution of the residual waiting time
for the sequence $\{T_i\}$ that is defined as follows: $P(T_i = L_i) = P(T_i = M_i) =
1/2$. By the usual renewal-theoretic apparatus, one can show that

(a) $G^T = \frac{EL}{EL + EM} G^L + \frac{EM}{EL + EM} G^M$.

(b) Let $n_t = \min\{n : \sum_{i=1}^{n} T_i \ge t\}$. The asymptotic probability (as $t \to \infty$)
that $T_{n_t}$ is of the type $L$, $M$ is $EL/(EL + EM)$, $EM/(EL + EM)$,
respectively.

(c) Conditional on $T_{n_t}$ being of the type $L$, $M$, the asymptotic (as $t \to \infty$)
distribution of the residual waiting time $\sum_{i=1}^{n_t} T_i - t$ is $G^L$, $G^M$, respectively:

$$\lim_{t \to \infty} P\Big(\sum_{i=1}^{n_t} T_i - t \le x \,\Big|\, T_{n_t} \text{ is of type } L, M\Big) = G^L(x),\ G^M(x).$$

Now, let $L = N_{A_{B_1}}$ and $M = N_{A_{B_2}}$. Note that the notation $n_t$ in terms
of $L$ and $M$ is the same as $J_t$ in terms of $N_{A_{B_1}}$ and $N_{A_{B_2}}$. Recall that the
procedure based on $T$ "recycles" every time the Shiryaev-Roberts statistic
crosses the boundary ($A_{B_1}$ if the cycle has $T$ of type $N_{A_{B_1}}$ and $A_{B_2}$ if the
cycle has $T$ of type $N_{A_{B_2}}$). Let $R_t^N$ be the value of the detection statistic at
time $t$, where $N$ is a generic stopping time that is applied repeatedly.

Let $R_n^{(1)}(N_{A_{B_1}})$ be equal to $R_n^{(i)}$ of (12) for $B = B_1$ and let $R_n^{(2)}(N_{A_{B_2}})$
be the same for $B = B_2$. To emphasize the dependence of $J_\nu$ and $Q_j$ on
the stopping time being used, we will write $J_\nu^N$ and $Q^N(j)$. With this
notation, for $j = 1, 2$,

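$$P_\infty\big\{R_\nu^{T} \le x \mid Q^{T}(J_\nu^{T} - 1) = \nu - k,\ T_{J_\nu^{T}} \text{ is of type } N_{A_{B_j}}\big\} = P_\infty\big\{R_k^{(j)}(N_{A_{B_j}}) \le x \mid N_{A_{B_j}} \ge k\big\} =: F_k^{(j)}(x).$$

Therefore,

$$\begin{aligned}
P_\infty\big(R_\nu^{T} \le x\big) ={}& P_\infty\big(T_{J_\nu^{T}} \text{ is of type } N_{A_{B_1}}\big) \sum_{k=1}^{\nu} F_k^{(1)}(x) \\
&\times P_\infty\big(Q^{T}(J_\nu^{T} - 1) = \nu - k \mid T_{J_\nu^{T}} \text{ is of type } N_{A_{B_1}}\big) \\
&+ P_\infty\big(T_{J_\nu^{T}} \text{ is of type } N_{A_{B_2}}\big) \sum_{k=1}^{\nu} F_k^{(2)}(x) \\
&\times P_\infty\big(Q^{T}(J_\nu^{T} - 1) = \nu - k \mid T_{J_\nu^{T}} \text{ is of type } N_{A_{B_2}}\big),
\end{aligned}$$

and, by (15),

$$\begin{aligned}
\lim_{\nu \to \infty} P_\infty\big(R_\nu^{T} \le x\big) ={}& \frac{E_\infty N_{A_{B_1}}}{E_\infty N_{A_{B_1}} + E_\infty N_{A_{B_2}}} \sum_{k=1}^{\infty} F_k^{(1)}(x)\, \frac{P_\infty(N_{A_{B_1}} \ge k)}{E_\infty N_{A_{B_1}}} \\
&+ \frac{E_\infty N_{A_{B_2}}}{E_\infty N_{A_{B_1}} + E_\infty N_{A_{B_2}}} \sum_{k=1}^{\infty} F_k^{(2)}(x)\, \frac{P_\infty(N_{A_{B_2}} \ge k)}{E_\infty N_{A_{B_2}}}.
\end{aligned}$$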

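By abuse of notation, write AD2D($N$) for the limit (as $\nu \to \infty$) of the
average delay to detection when a stopping time $N$ is applied repeatedly. It
now follows that

$$\mathrm{AD2D}(T) = \frac{B_1}{B_1 + B_2}\, \mathrm{AD2D}(B_1) + \frac{B_2}{B_1 + B_2}\, \mathrm{AD2D}(B_2) = \mathrm{AD2D}(B_1) = \mathrm{AD2D}(B_2).$$

Note that $E_\infty T = \tfrac{1}{2} E_\infty N_{A_{B_1}} + \tfrac{1}{2} E_\infty N_{A_{B_2}} = (B_1 + B_2)/2$. By definition
of $B_1$ and $B_2$, it follows that

$$\mathrm{AD2D}(T) < \mathrm{AD2D}(N_{A_B}) = \mathrm{AD2D}(B) \quad \text{for } B = \tfrac{1}{2}(B_1 + B_2),$$

which contradicts (ii) for $B = \tfrac{1}{2}(B_1 + B_2)$. □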
Remark 2. Theorem 2(iii) implies that Corollary 1 holds for all stopping
times $N \in \Delta_B$.

Remark 3. Shiryaev (1963) proved a result similar to Theorem 2(ii) for
Brownian motion when a change occurs in the drift, and called this problem
"Quickest Detection of a Disorder in a Stationary Regime."

Remark 4. Theorem 2 is important in a variety
of surveillance applications such as target detection and tracking, rapid
detection of intrusions in computer networks, and environmental monitoring,
to name a few. In all of these applications, it is of utmost importance to
detect very rapidly changes that may occur in a distant future, in which
case the true detection of a real change may be preceded by a long interval
with frequent false alarms that are being filtered by a separate mechanism or
algorithm. For example, falsely initiated target tracks are usually filtered by
a track confirmation/deletion algorithm; false detections of attacks in
computer networks in anomaly-based Intrusion Detection Systems (IDS) may
be filtered by signature-based IDS algorithms, etc. See, e.g., Tartakovsky
(1991); Tartakovsky et al. (2006); Tartakovsky and Veeravalli (2004). The
practical implication of Theorem 2 is that in these circumstances one has
reason to prefer the Shiryaev-Roberts procedure to other surveillance schemes.